Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process

Authors

  • Camille Fauth
  • Anne Bonneau
  • Frank Zimmerer
  • Jürgen Trouvain
  • Bistra Andreeva
  • Vincent Colotte
  • Dominique Fohr
  • Denis Jouvet
  • Jeanin Jügler
  • Yves Laprie
  • Odile Mella
  • Bernd Möbius

Abstract

We present the design of a corpus of native and non-native speech for the language pair French-German, with a special emphasis on phonetic and prosodic aspects. To our knowledge, no corpus of suitable size and coverage is currently available for this language pair. To select the target L1-L2 interference phenomena, we prepared a small preliminary corpus (corpus1), which was analyzed for coverage and cross-checked jointly by French and German experts. Based on this analysis, target phenomena at the phonetic and phonological levels were selected according to the expected degree of deviation from native performance and their frequency of occurrence. Fourteen speakers recorded both L2 material (either French or German) and L1 material (either German or French). This allowed us to test the recording duration, the recording material, and the performance of our automatic aligner. We then built corpus2, taking into account what we had learned from corpus1. The aims are the same, but we adapted the speech material to avoid overly long recording sessions. One hundred speakers will be recorded. The corpus (corpus1 and corpus2) will be prepared as a searchable database and made available to the scientific community after completion of...

Similar resources

The IFCASL Corpus of French and German Non-native and Native Read Speech

The IFCASL corpus is a French-German bilingual phonetic learner corpus designed, recorded and annotated in a project on individualized feedback in computer-assisted spoken language learning. The motivation for setting up this corpus was that there is no phonetically annotated and segmented corpus of comparable size and coverage for this language pair. In contrast to most learner corpora, the...

Inter-annotator agreement for a speech corpus pronounced by French and German language learners

This paper presents the results of an investigation of inter-annotator agreement for the non-native and native French part of the IFCASL corpus. This large bilingual speech corpus for French and German language learners was manually annotated by several annotators. This manual annotation is the starting point which will be used both to improve the automatic segmentation algorithms and derive dia...

French Learners Audio Corpus of German Speech (FLACGS)

The French Learners Audio Corpus of German Speech (FLACGS) was created to compare German speech production of German native speakers (GG) and French learners of German (FG) across three speech production tasks of increasing production complexity: repetition, reading and picture description. Forty speakers, 20 GG and 20 FG, performed each of the three tasks, which in total leads to approximately 7h ...

Role of Monolingualism/Bilingualism on Pragmatic Awareness and Production of Apology Speech Act of English as a Second and Third Language

The present study investigated the pragmatic awareness and production of Iranian Turkish and Persian EFL learners in the speech act of apology. Sixty-eight learners of English studying at several universities in Iran were selected based on simple random sampling as the monolingual and bilingual participants. Data were elicited by means of a written discourse self-assessment/completion test (WDS...

Addressing Code-Switching in French/Algerian Arabic Speech

This study focuses on code-switching (CS) in French/Algerian Arabic bilingual communities and investigates how speech technologies, such as automatic data partitioning, language identification and automatic speech recognition (ASR) can serve to analyze and classify this type of bilingual speech. A preliminary study carried out using a corpus of Maghrebian broadcast data revealed a relatively hi...

Publication date: 2014